Learn how to use the Bayesian average to rank items by star rating more reliably and accurately.
The left and right rankings show avg_star_rating and bayes_avg, respectively, under each item.
By putting Item A at the top, the left side’s ranking is both misleading and unsatisfying. The ranking on the right, based on the Bayesian average, reflects a better balance between rating and quantity of ratings. This example shows how the Bayesian average lowered Item A’s average to 4.3 because it measured A’s 10 ratings against B and C’s much larger numbers of ratings. As described later, the Bayesian average left Items B and C unchanged because it affects items with low rating counts much more than those with more ratings.
In sum, by relativizing ratings in this way, the Bayesian average creates a more reliable comparison between products. It ensures that products with lower numbers of ratings have less weight in the ranking. What follows is a description of the Bayesian average and how to code it.
The Bayesian average relies on two constants (m and C). m is a straightforward arithmetic average across all products: the sum of all ratings divided by the total number of ratings.
Calculating C requires a bit more math. This tutorial calculates C based on the distribution of the rating counts for each product, where C is equal to the 25th percentile (the lower quartile). For example, suppose a store has 100 products. To compute C, you take all the products and sort them by the number of ratings each has. Some have 10 ratings and others have 100 or 1000 ratings. Once sorted, you find the product at the 25% position on the sorted list and look at how many ratings it has. That number of ratings is the lower quartile, and it becomes C. For simplicity, this guide sets C = 100.
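The sorting procedure described above can be sketched in Python. This is a minimal illustration, assuming the rating counts are already available as a plain list; the function name and the sample catalog are hypothetical, and the nearest-rank rule used here is only one of several percentile definitions (others interpolate between neighbors).

```python
import math

def lower_quartile_count(ratings_counts):
    """Return the rating count found at the 25% position of the sorted list."""
    if not ratings_counts:
        raise ValueError("need at least one product")
    ordered = sorted(ratings_counts)
    # Nearest-rank position: the ceil(0.25 * n)-th item, as a 0-based index.
    position = math.ceil(0.25 * len(ordered)) - 1
    return ordered[position]

# Hypothetical catalog of 8 products and their rating counts.
counts = [1000, 10, 250, 40, 100, 15, 600, 75]
C = lower_quartile_count(counts)
print(C)  # 15: the second of the eight sorted counts
```

With 100 products, the same function returns the rating count of the 25th product in the sorted list.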
Thus, if you calculate the overall average rating (m) of the store’s catalog to be 3.5, the Bayesian average uses both of these values (m = 3.5 and C = 100) to adjust the arithmetic average. It does this using the following formula:
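A common way to write the Bayesian average, and the form assumed in this sketch, is shown here. The names avg_rating and ratings_count are assumptions that stand for an item's own arithmetic average and its number of ratings.

$$
\text{bayes\_avg} = \frac{\text{avg\_rating} \times \text{ratings\_count} + C \times m}{\text{ratings\_count} + C}
$$

When ratings_count is small compared with C, the result stays close to m. When it is much larger than C, the result stays close to the item's own average.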
Here’s the same formula with the example numbers plugged in:
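As an illustration, take a hypothetical item with an arithmetic average of 4.8 from 10 ratings. These two item values are assumptions made for this sketch; only m = 3.5 and C = 100 come from the text above. Assuming the usual weighted form of the Bayesian average, the numerator adds the item's own rating total (4.8 × 10) to the weight of the catalog average (100 × 3.5):

$$
\frac{4.8 \times 10 + 100 \times 3.5}{10 + 100} = \frac{398}{110} \approx 3.62
$$

With only 10 ratings against C = 100, the item's 4.8 is pulled most of the way toward the catalog average of 3.5.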
This section describes how to calculate the Bayesian constants (m and C) and the Bayesian average itself. It also discusses when to calculate these values.
Each record should include the following attributes: the average rating (avg_stars_rating); the Bayesian average (bayes_average), which can be empty or 0 to start; and the number of ratings (ratings_count).
Each record includes a bayes_average attribute. The purpose of the following code is to calculate the value for bayes_average.
Additionally, the sample dataset doesn’t show other attributes, such as the description of the product, price, item specifications, etc.
m and C represent the two Bayesian constants. In this code, they’re assigned the values from the preceding section (m = 3.5 and C = 100):
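A minimal Python sketch of this step follows. It assumes records are plain dictionaries that use the attribute names above and that the Bayesian average takes its usual weighted form; the function name and the sample records are hypothetical, and no search client calls are shown.

```python
# Bayesian constants, using the values from the preceding section.
m = 3.5  # overall average rating of the catalog
C = 100  # lower quartile of the rating counts

def bayes_average(avg_stars_rating, ratings_count):
    """Pull an item's average toward m; items with few ratings move the most."""
    return (avg_stars_rating * ratings_count + C * m) / (ratings_count + C)

# Hypothetical records; real records would carry more attributes.
records = [
    {"objectID": "A", "avg_stars_rating": 5.0, "ratings_count": 10, "bayes_average": 0},
    {"objectID": "B", "avg_stars_rating": 4.5, "ratings_count": 2000, "bayes_average": 0},
]

for record in records:
    # Fill in the bayes_average attribute, rounded to two decimals.
    record["bayes_average"] = round(
        bayes_average(record["avg_stars_rating"], record["ratings_count"]), 2
    )
    print(record["objectID"], record["bayes_average"])
```

In this sketch, item A drops from 5.0 to 3.64 because it has only 10 ratings, while item B barely moves (4.5 to 4.45) because its 2000 ratings far outweigh C.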
To calculate the C constant, follow the approach from the preceding section: use the lower quartile, which corresponds to the 25th percentile. You can calculate this value using the following SQL function:
In the code, avg_stars_rating and ratings_count match the attributes in the index that represent, respectively, the average rating and the number of ratings for each product:
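One way to derive m from these two attributes can be sketched in Python. It assumes the records have already been read into a list of dictionaries, and it recovers each product's sum of ratings as avg_stars_rating × ratings_count, which is an assumption about how the sum of all ratings is obtained. The function name and sample values are hypothetical.

```python
def overall_average(records):
    """m: the sum of all ratings divided by the total number of ratings."""
    total_ratings = sum(r["ratings_count"] for r in records)
    if total_ratings == 0:
        raise ValueError("no ratings in the catalog")
    # avg * count recovers each product's sum of ratings.
    total_stars = sum(r["avg_stars_rating"] * r["ratings_count"] for r in records)
    return total_stars / total_ratings

# Hypothetical records with only the two attributes needed here.
records = [
    {"avg_stars_rating": 5.0, "ratings_count": 10},
    {"avg_stars_rating": 4.0, "ratings_count": 90},
]
m = overall_average(records)
print(round(m, 2))  # 4.1
```

Weighting by ratings_count keeps a product with 10 ratings from counting as much as one with 90; a plain mean of the two per-product averages would give 4.5 instead.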
To retrieve every record from the index, use the browse method. As for the constants (m and C), you can create a batch job that runs once a week or once a month. These constants don’t need to change that often.
Use the bayes_average attribute you’ve added to each of your records as a custom ranking value, as seen in the following image:
profit_margin acts as a secondary tie-breaker for the primary Bayesian average ranking:
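The tie-breaking behavior can be illustrated outside the search engine with a local sort in Python. This is only a sketch of the ordering logic, assuming descending order on both attributes and hypothetical record values; it does not show how the index setting is configured, and it ignores the engine's other ranking criteria.

```python
# Hypothetical records that already carry a computed bayes_average.
records = [
    {"objectID": "A", "bayes_average": 4.3, "profit_margin": 0.10},
    {"objectID": "B", "bayes_average": 4.5, "profit_margin": 0.20},
    {"objectID": "C", "bayes_average": 4.5, "profit_margin": 0.35},
]

# Sort by bayes_average first (descending); profit_margin only decides ties.
ranked = sorted(
    records,
    key=lambda r: (r["bayes_average"], r["profit_margin"]),
    reverse=True,
)
print([r["objectID"] for r in ranked])  # ['C', 'B', 'A']
```

B and C share the same Bayesian average, so the higher profit margin puts C first. A ranks last regardless of its margin because its Bayesian average is lower.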